Lempel-Ziv Factorization Using Less Time & Space

Authors

  • Gang Chen
  • Simon J. Puglisi
  • William F. Smyth
Abstract

For 30 years the Lempel-Ziv factorization LZ_x of a string x = x[1..n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing all the repetitions (runs) in x. Traditionally the standard method for computing LZ_x was based on Θ(n)-time (or, depending on the measure used, O(n log n)-time) processing of the suffix tree ST_x of x. Recently Abouelhoda et al. proposed an efficient Lempel-Ziv factorization algorithm based on an “enhanced” suffix array, that is, a suffix array SA_x together with supporting data structures, principally an “interval tree”. In this paper we introduce a collection of fast space-efficient algorithms for LZ factorization, also based on suffix arrays, that in theory as well as in many practical circumstances are superior to those previously proposed; one family out of this collection achieves true Θ(n)-time alphabet-independent processing in the worst case by avoiding tree structures altogether. Mathematics Subject Classification (2000). Nonnumerical Algorithms 68W05.
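To make the object being computed concrete, here is a minimal sketch of the LZ factorization itself. It is a naive quadratic-time scan, not one of the suffix-array algorithms the abstract describes, and it assumes the common definition in which each factor is either a letter with no earlier occurrence or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). The function name `lz_factorize` and the output encoding are hypothetical choices for illustration, and positions are 0-based rather than the 1-based x[1..n] used above.

```python
def lz_factorize(x):
    """Naive LZ factorization (illustrative only, O(n^2) time or worse).

    Returns a list of factors. A factor is (pos, length) when the factor
    copies `length` characters starting at earlier position `pos`, or
    (letter, 0) when the letter has not occurred before.
    """
    n = len(x)
    factors = []
    i = 0
    while i < n:
        best_len, best_pos = 0, -1
        # Try every earlier starting position j < i and keep the longest match.
        for j in range(i):
            k = 0
            # The match may run past position i (self-overlap is allowed).
            while i + k < n and x[j + k] == x[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len == 0:
            # No earlier occurrence: the factor is a single new letter.
            factors.append((x[i], 0))
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors


if __name__ == "__main__":
    # "abaababa" splits as a | b | a | aba | ba under this definition.
    print(lz_factorize("abaababa"))
```

The inner scan over all earlier positions is the step that suffix trees or suffix arrays replace with an efficient longest-previous-match lookup.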

Similar resources

Faster Compact On-Line Lempel-Ziv Factorization

We present a new on-line algorithm for computing the Lempel-Ziv factorization of a string that runs in O(N log N) time and uses only O(N log σ) bits of working space, where N is the length of the string and σ is the size of the alphabet. This is a notable improvement compared to the performance of previous on-line algorithms using the same order of working space but running in either O(N log^3 N)...


Computing Reversed Lempel-Ziv Factorization Online

Kolpakov and Kucherov proposed a variant of the Lempel-Ziv factorization, called the reversed Lempel-Ziv (RLZ) factorization (Theoretical Computer Science, 410(51):5365–5373, 2009). In this paper, we present an on-line algorithm that computes the RLZ factorization of a given string w of length n in O(n log n) time using O(n log σ) bits of space, where σ ≤ n is the alphabet size. Also, we introd...


Linear Time Lempel-Ziv Factorization: Simple, Fast, Small

Computing the LZ factorization (or LZ77 parsing) of a string is a computational bottleneck in many diverse applications, including data compression, text indexing, and pattern discovery. We describe new linear time LZ factorization algorithms, some of which require only 2n log n + O(log n) bits of working space to factorize a string of length n. These are the most space efficient linear time al...


Space Efficient Linear Time Lempel-Ziv Factorization on Constant Size Alphabets

We present a new algorithm for computing the Lempel-Ziv Factorization (LZ77) of a given string of length N in linear time, that utilizes only N log N + O(1) bits of working space, i.e., a single integer array, for constant size integer alphabets. This greatly improves the previous best space requirement for linear time LZ77 factorization (Kärkkäinen et al. CPM 2013), which requires two integer arr...


Lempel Ziv Computation in Small Space (LZ-CISS)

For both the Lempel Ziv 77- and 78-factorization we propose algorithms generating the respective factorization using (1 + ε)n lg n + O(n) bits (for any positive constant ε ≤ 1) working space (including the space for the output) for any text of size n over an integer alphabet in O(n/ε)


Journal title:
  • Mathematics in Computer Science

Volume 1  Issue 

Pages  -

Publication date 2008